Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging
Abstract
This article focuses on improving machine learning models that predict protein expression levels from codon encoding. The models were built with support vector regression (SVR) and partial least squares (PLS), and SVR yields better predictions than PLS. The models' predictive ability can be improved by adding two input features, codon identification number and codon count, to the codon bias and minimum free energy features already in use. Applying ensemble averaging to the SVR or PLS models improves the results further. The present work motivates testing different ensembles and features to improve prediction models whose correlation coefficients are still far from perfect. These results are relevant for optimizing codon usage and enhancing protein expression levels in synthetic biology problems.
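As a rough illustration of ensemble averaging, the sketch below trains several base models on bootstrap resamples of the training data and averages their predictions, then scores the result with a Pearson correlation coefficient. It is not the article's method: the abstract does not say how the ensemble members were built, so bootstrap resampling is an assumption, a one-feature least-squares line stands in for SVR or PLS, and the data are synthetic. All function names (`fit_line`, `train_ensemble`, `predict_ensemble`, `pearson_r`) are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b; a simple stand-in for SVR or PLS."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to a constant model
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def train_ensemble(xs, ys, n_models=25, seed=0):
    """Fit one base model per bootstrap resample (assumed ensemble scheme)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_ensemble(models, x):
    """Ensemble averaging: the mean of the individual model predictions."""
    return statistics.fmean([a * x + b for a, b in models])


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)


# Synthetic data: one made-up input feature and a noisy linear target.
rng = random.Random(42)
x_train = [rng.uniform(0.0, 1.0) for _ in range(40)]
y_train = [2.0 * x + 1.0 + rng.gauss(0.0, 0.2) for x in x_train]
x_test = [rng.uniform(0.0, 1.0) for _ in range(20)]
y_test = [2.0 * x + 1.0 + rng.gauss(0.0, 0.2) for x in x_test]

models = train_ensemble(x_train, y_train)
y_pred = [predict_ensemble(models, x) for x in x_test]
print("held-out correlation:", round(pearson_r(y_test, y_pred), 3))
```

With a real dataset, the base learner would be replaced by an SVR or PLS model and the single feature by the full feature vector (codon bias, minimum free energy, codon identification number, codon count); the averaging step itself would stay the same.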